Abstract:Large recommendation models have demonstrated substantial potential gains under scaling laws, yet these gains are difficult to realize in industrial recommendation systems because real-world deployment requires lightweight models with strict serving efficiency and latency guarantees. This creates a fundamental gap between offline model scaling and online deployment. In this work, we present Rec-Distill, an industrial distillation pipeline that transfers the performance gains of large-scale recommendation modeling to efficient serving models. Rec-Distill combines large-teacher scaling with student-side transfer optimization through decoupled training, black-box distillation, debiasing mechanism, and a hybrid batch-streaming pipeline for dynamic recommendation environments. Across multiple recommendation and advertising scenarios on real-world platforms, our framework scales teacher models up to 24B dense parameters and 20K behavior sequence length, while enabling lightweight students to recover a substantial portion of teacher gains, with distillation transferability exceeding 60% in the best setting. Extensive offline and online experiments further show that these transferred gains consistently translate into measurable business improvements under industrial constraints. These results demonstrate that Rec-Distill provides a practical framework for distilling large-scale recommendation models into deployable, cost-efficient serving systems, while also establishing a reliable path toward scaling recommendation models to even larger regimes in the future.
Abstract:Although sophisticated sequence modeling paradigms have achieved remarkable success in recommender systems, the information capacity of hand-crafted sequential features constrains the performance upper bound. To better enhance user experience by encoding historical interaction patterns, this paper presents a novel two-stage sequence modeling framework termed Instance-As-Token (IAT). The first stage of IAT compresses all features of each historical interaction instance into a unified instance embedding, which encodes the interaction characteristics in a compact yet informative token. Both temporal-order and user-order compression schemes are proposed, with the latter better aligning with the demands of downstream sequence modeling. The second stage involves the downstream task fetching fixed-length compressed instance tokens via timestamps and adopting standard sequence modeling approaches to learn long-range preferences patterns. Extensive experiments demonstrate that IAT significantly outperforms state-of-the-art methods and exhibits superior in-domain and cross-domain transferability. IAT has been successfully deployed in real-world industrial recommender systems, including e-commerce advertising, shopping mall marketing, and live-streaming e-commerce, delivering substantial improvements in key business metrics.